Video content analysis for automatic demographics recognition of users and videos

ABSTRACT

A video demographics analysis system selects a training set of videos to use to correlate viewer demographics and video content data. The video demographics analysis system extracts demographic data from viewer profiles related to videos in the training set and creates a set of demographic distributions, and also extracts video data from videos in the training set. Using a machine learning process, the video demographics analysis system correlates the viewer demographics with the video data of the videos viewed by those viewers. Using the prediction model produced by the machine learning process, a new video about which there is no a priori knowledge can be associated with a predicted demographic distribution specifying probabilities of the video appealing to different types of people within a given demographic category, such as people of different ages within an age demographic category.

CROSS REFERENCE TO RELATED APPLICATIONS

The application claims the benefit of Provisional Application No. 61/147,736, filed on Jan. 27, 2009, which is hereby incorporated herein by reference.

BACKGROUND

1. Field of Art

The present invention generally relates to the field of digital video, and more specifically, to methods of correlating demographic data with characteristics of video content.

2. Background of the Invention

Video hosting sites, such as YouTube or Google Video, currently have millions of users and tens of millions of videos. Users may sometimes have difficulty in determining which videos would be of interest to them, and may be daunted by the sheer volume of videos available for viewing. Thus, the ability to suggest which videos would be of interest to a given user is highly valuable.

However, conventional systems typically merely rely on external metadata associated with the video, such as keywords or textual video descriptions, to predict demographics that would be interested in the video. For example, conventional systems might recommend videos having keywords matching those specified in a viewer profile as being of interest to that viewer. However, if the video is new and has not yet been viewed and rated, and if the associated title is “spam” that misrepresents the true content of the video, then the conventional approach produces spurious predictions. Thus, one shortcoming of conventional approaches is that they rely on external metadata that may be false when assessing the pertinence of a given video to a particular viewer, rather than examining the actual video content itself.

SUMMARY

A video demographics analysis system creates demographic prediction models that predict the demographic characteristics of viewers of a video, based on quantitative video content data extracted from the videos.

In one aspect, the system selects a training set of videos to use to correlate viewer demographic attributes, such as age and gender, and video content data. The video demographics analysis system determines which viewers have viewed videos in the training set, and extracts demographic data from the viewer profiles of these viewers. The demographic data can include any information describing demographic attributes of the viewers, including but not limited to age, gender, occupation, household income, location, interests, and the like. From the extracted demographic data, the system creates a set of demographic distributions for each video in the training set. The video demographics analysis system also extracts video data from videos in the training set, the video data comprising quantitative information on visual and/or audio features of the videos. Then, a machine learning process is applied to correlate the viewer demographics for the training set videos with the video data of the training set videos, thereby creating a prediction model for the training set videos.

In another aspect, the system uses a prediction model produced by the machine learning process to predict, for a video about which there is little or no prior information about the demographics of viewers, a demographic distribution specifying probabilities of the video appealing to viewers in various different demographic categories, such as viewers of different ages, genders, and so forth. The ability to obtain predicted demographic distributions for a video has a number of useful applications, such as determining a group to which to recommend a new video, estimating the demographics of a viewer lacking a reliable user profile, and recommending videos to a viewer based on the viewer's demographic attributes.

In one embodiment, a computer-implemented method of generating a prediction model for videos receives a plurality of videos from a video repository, each video having an associated list of viewers. For each video, the method creates a demographic distribution for a specified demographic based at least in part on user profile data associated with viewers of the video, and generates feature vectors based on the content of the video. The method further generates a prediction model that correlates the feature vectors for the videos and the demographic distributions, and stores the generated prediction model.

In one embodiment, a computer-implemented method for determining demographics of a video stores a prediction model that correlates viewer demographic attributes with feature vectors extracted from videos viewed by viewers, wherein the viewer demographic attributes include age and gender. The method further generates from content of the video a set of feature vectors, and uses the trained prediction model to determine likely demographic attributes of video viewers given those feature vectors.

The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates the architecture of a video demographics analysis system, according to one embodiment.

FIG. 2 illustrates the components of a video analysis server, according to one embodiment.

FIG. 3 is a flowchart illustrating a high-level view of a process of performing the correlation, according to one embodiment.

The figures depict embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

DETAILED DESCRIPTION

System Architecture

FIG. 1 illustrates the architecture of a system for performing video demographics analysis of viewer profile information and digital video content and correlating demographic and video feature data, according to one embodiment.

As shown in FIG. 1, a video hosting website 100 comprises a front end server 140, a video serving module 110, an ingest module 115, a video analysis server 130, a video search server 145, a video access log 160, a user database 150, and a video database 155. Many conventional features, such as firewalls, load balancers, application servers, failover servers, site management tools and so forth are not shown so as not to obscure the features of the system.

Most generally, the video hosting website 100 represents any system that allows users (equivalently “viewers”) to access video content via searching and/or browsing interfaces. The sources of videos can be from user uploads of videos, searches or crawls of other websites or databases of videos, or the like, or any combination thereof. For example, in one embodiment a video hosting site 100 can be configured to allow for user uploads of content; in another embodiment a video hosting website 100 can be configured to only obtain videos from other sources by crawling such sources or searching such sources in real time. A suitable website 100 for implementation of the system is the YOUTUBE™ website, found at www.youtube.com; other video hosting sites are known as well, and can be adapted to operate according to the teachings disclosed herein. It will be understood that the term “web site” represents any computer system adapted to serve content using any internetworking protocols, and is not intended to be limited to content uploaded or downloaded via the Internet or the HTTP protocol. In general, functions described in one embodiment as being performed on the server side can also be performed on the client side in other embodiments if appropriate. In addition, the functionality attributed to a particular component can be performed by different or multiple components operating together.

Each of the various servers and modules is implemented as a server program executing on a server-class computer comprising a CPU, memory, network interface, peripheral interfaces, and other well known components. The computers themselves preferably run an open-source operating system such as LINUX, have generally high performance CPUs, 1 GB or more of memory, and 100 GB or more of disk storage. Of course, other types of computers can be used, and it is expected that as more powerful computers are developed in the future, they can be configured in accordance with the teachings here. The functionality implemented by any of the elements can be provided from computer program products that are stored in tangible computer accessible storage mediums (e.g., RAM, hard disk, or optical/magnetic media).

A client 170 executes a browser 171 and can connect to the front end server 140 via a network 180, which is typically the Internet, but can also be any network, including but not limited to any combination of a LAN, a MAN, a WAN, a mobile, wired or wireless network, a private network, or a virtual private network. While only a single client 170 and browser 171 are shown, it is understood that very large numbers (e.g., millions) of clients are supported and can be in communication with the video hosting website 100 at any time. The client 170 may include a variety of different computing devices. Examples of client devices 170 are personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones or laptop computers. As will be obvious to one of ordinary skill in the art, the present invention is not limited to the devices listed above.

The browser 171 can include a video player (e.g., Flash™ from Adobe Systems, Inc.), or any other player adapted for the video file formats used in the video hosting website 100. Alternatively, videos can be accessed by a standalone program separate from the browser 171. A user can access a video from the video hosting website 100 by browsing a catalog of videos, conducting searches on keywords, reviewing play lists from other users or the system administrator (e.g., collections of videos forming channels), or viewing videos associated with particular user groups (e.g., communities).

Users of clients 170 can also search for videos based on keywords, tags or other metadata. These requests are received as queries by the front end server 140 and provided to the video search server 145, which is responsible for searching the video database 155 for videos that satisfy the user queries. The video search server 145 supports searching on any fielded data for a video, including its title, description, tags, author, category and so forth.

Users of the clients 170 and browser 171 can upload content to the video hosting website 100 via network 180. The uploaded content can include, for example, video, audio or a combination of video and audio. The uploaded content is processed by an ingest module 115, which processes the video for storage in the video database 155. This processing can include format conversion (transcoding), compression, metadata tagging, and other data processing. An uploaded content file is associated with the uploading user, and so the user's account record is updated in the user database 150 as needed. For purposes of convenience and the description of one embodiment, the uploaded content will be referred to as “videos,” “video files,” or “video items,” but no limitation on the types of content that can be uploaded is intended by this terminology. Thus, the operations described herein for identifying related items can be applied to any type of content, not only videos; other suitable types of content items include audio files (e.g. music, podcasts, audio books, and the like), documents, multimedia presentations, and so forth. In addition, related items need not be of the same type. Thus, given a video, the related items may include one or more audio files, documents, and so forth in addition to other videos.

The video database 155 is used to store the ingested videos. The video database 155 stores video content and associated metadata provided by their respective content owners. Each uploaded video is assigned a video identifier (ID) when it is processed by the ingest module 115. The video files have metadata associated with each file such as a video ID, artist, video title, label, genre, time length, and optionally geo-restrictions that can be used for data collection or content blocking on a geographic basis. The video files can be encoded as H.263, H.264, WMV, VC-1 or the like; audio can be encoded as MP3, AAC, or the like. The files can be stored in any suitable container format, such as Flash, AVI, MP4, MPEG-2, RealMedia, DivX and the like.

The video hosting website 100 further comprises a viewer profile repository 105. The viewer profile repository 105 comprises a plurality of profiles of users/viewers of digital videos, such as the users of video hosting systems such as YouTube™ and Google Video™. A viewer profile stores demographic information on various attributes of an associated viewer, such as the viewer's gender, age, location, income, occupation, level of education, stated preferences, and the like. The information may be provided by viewers themselves, when they create a profile, and can be further supplemented with information extracted automatically from other sources. For example, one profile entry could specify that the viewer was a 24-year-old male, with a college education, living in Salt Lake City, and with specified interests in archaeology and tennis. The exact demographic categories stored in the viewer profile can vary in different embodiments, depending on how the profiles are defined by the system administrator.

The video hosting website 100 further comprises a video access log 160, which stores information describing each access to any video by any viewer. Thus, each video effectively has an associated list of viewers. Each individual viewer is assigned an ID, for example, based on his or her IP address to differentiate the individual viewers. In one embodiment, this viewer ID is an anonymized viewer ID that is assigned to each individual viewer to keep viewer identities private, for example an opaque identifier such as a unique random number or a hash value. The system then can access each viewer's demographic information without obtaining his or her identity. In an alternative embodiment, the actual identity of the viewers may be known or determinable. In any case, for each viewer, the video access log 160 tracks the viewer's interactions with videos. In one embodiment, each entry in the video access log 160 identifies a video being accessed, a time of access, an IP address of the viewer, a viewer ID if available, cookies, the viewer's search query that led to the current access, and data identifying the type of interaction with the video. Interaction types can include any viewer interactions in the viewer interface of the website, such as playing, pausing, rewinding and forwarding a video. The various viewer interaction types are considered viewer events that are associated with a given video. For example, one entry might store that a viewer at a given IP address started viewing a particular video at time 0:00:00 and stopped viewing at time 0:34:00.
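The hash-based variant of an anonymized viewer ID can be illustrated with a minimal sketch. The function name, the salt value, the 16-character truncation, and the choice of SHA-256 are all assumptions made for illustration; the description above only requires an opaque identifier such as a unique random number or a hash value.

```python
import hashlib

# Hypothetical secret salt; not part of the description above. A real
# deployment would keep such a value out of source code.
EXAMPLE_SALT = b"example-salt"


def anonymized_viewer_id(raw_identifier: str, salt: bytes = EXAMPLE_SALT) -> str:
    """Map a raw identifier (e.g. an account name) to an opaque hash value."""
    digest = hashlib.sha256(salt + raw_identifier.encode("utf-8")).hexdigest()
    # Truncation to 16 hex characters is an arbitrary illustrative choice.
    return digest[:16]
```

The same raw identifier always yields the same opaque ID, so log entries for one viewer can still be grouped without storing the raw identifier in the log.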

The video hosting website 100 further comprises a video analysis server 130, which correlates demographic information about videos with the content of the videos themselves. This involves generating demographic distributions from demographic data, analyzing video content, and generating a prediction model relating the demographic distributions and the video content. The video analysis server 130 also can predict a demographic distribution for a video and serve demographic queries (e.g., provide information about demographic information across videos).

Referring now to FIG. 2, there are shown the modules in one embodiment of the video analysis server 130. The video analysis server 130 comprises a demographics database 210 and a feature vector repository 215, a demographics module 250, a video content analysis module 255, and a correlation module 260, and additionally comprises a prediction model 220.

The demographics database 210 stores data regarding distributions of demographic data with respect to videos. For example, certain videos can have an associated demographic distribution for various demographic attributes of interest, such as age and gender. In some embodiments, distributions are created for combined attributes, such as gender-age, e.g. for a given video, that 4% of viewers are females aged 13 to 17. For instance, a given video may have an age-gender distribution such as the following:

          13-17   18-21   22-25   26-30   . . .
Male       5.6    12.3    13.8     8.5    . . .
Female     4.0     8.6    10.2     9.6    . . .

This distribution states that 5.6% of its viewers are males of ages 13 to 17, 4% are females of ages 13 to 17, 12.3% are males of ages 18 to 21, and the like. The values in the example distribution represent percentages of the viewers having the corresponding demographic characteristics, but they could also be normalized with respect to the general population, e.g. a value of 1.3 for males aged 13-17 indicating that 30% more of the viewings were by males aged 13-17 than their respective share of the population.
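The population-normalized form of a distribution can be sketched as follows. The function name and the population shares are hypothetical values chosen for illustration, not figures from this description; only the viewer percentages are taken from the example distribution.

```python
def normalize_to_population(viewer_share, population_share):
    """Divide each group's share of viewers by its share of the population.

    Both arguments map a group key to a percentage. Groups with no known
    (or zero) population share are skipped rather than guessed.
    """
    normalized = {}
    for group, share in viewer_share.items():
        base = population_share.get(group)
        if not base:
            continue
        normalized[group] = share / base
    return normalized


# Viewer percentages from the example distribution.
viewer_share = {("male", "13-17"): 5.6, ("female", "13-17"): 4.0}
# Hypothetical population percentages, assumed for illustration only.
population_share = {("male", "13-17"): 4.0, ("female", "13-17"): 4.0}

ratios = normalize_to_population(viewer_share, population_share)
```

Under these assumed population shares, a ratio above 1 marks a group that is over-represented among the viewers of the video, and a ratio of exactly 1 marks a group viewed in proportion to its share of the population.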

Generally, any demographic attribute stored in a viewer profile may have corresponding distributions. A given demographic attribute may be represented at various different levels of granularity, such as 1-year, 3-year, or 5-year bins for ages, for example. Similarly, a given video can have a gender distribution in which 54% of its viewers are female, 38% of its viewers are male, and 8% are unknown, where the unknown values represent viewers lacking profiles or viewers with profiles lacking a value for the gender attribute. As an alternative to storing “unknown” values in the distributions, profiles lacking a value for an attribute of interest could be excluded during training.

In one embodiment, the distributions are represented as vectors, e.g. an array of integers <0, 6, 11, . . . > where each component represents a previously assigned age bin, representing that 0% of viewers are from ages 13 to 17, 6% are 18 to 21, and 11% are 22 to 25. Other storage implementations would be equally possible to one of skill in the art.

The demographics module 250 takes as input the data in the viewer profile repository 105 and creates the data on distributions stored in the demographics database 210. The video content analysis module 255 takes as input the video data in the video database 155 and the video access log data 160 and extracts feature vectors representing characteristics of the videos, such as visual and/or audio characteristics, and stores them in the feature vector repository 215. The correlation module 260 performs operations such as regression analysis on the data in the demographics database 210 and the feature vector repository 215, generating a prediction model 220 that can be, for example, used to predict particular viewer demographics to which a video represented by given feature vectors would be of interest. The operations of the modules 250-260 are described in more detail below with respect to FIG. 3.

Note that although the various data 210-220 and the modules 250-260 are depicted as all being located on a single server 130, they could be partitioned across multiple machines, databases or other storage units, and the like. The data 210-220 could be stored in a variety of manners as known to one of skill in the art. For example, they could be implemented as tables of a relational database management system, as individual binary or text files, etc.

Process of Demographic Correlation

FIG. 3 is a flowchart illustrating a high-level view of a process for performing the correlation of the correlation module 260, according to one embodiment. First, a training set of videos is selected 305 from the video database 155. In some embodiments, the training set is a subset of the videos of the video database 155, given that analyzing only a representative training set of videos is more computationally efficient than analyzing the entire set, though in other embodiments it is also possible to analyze all videos. The training set can be selected based on various filtering criteria. These filtering criteria include a number of views, number of viewers, number of unique views, date of views, date of upload and so forth. The filtering criteria can be used in any combination. For example, the training set can be established as the N videos (e.g., N=1000) which have been viewed at least K times (e.g., K=1,000,000) in the previous M (e.g., M=15) days, and which are at least T seconds (e.g., 30 seconds) in length. Here, K, M, N, and T are design decisions selected by the system administrator. The most recently viewed videos, or the videos viewed over a certain time period, can be determined by examining the start and stop dates and times of the video access log 160, for example. A video can be deemed to be “viewed” if it is watched for a minimum length of time, or a minimum percentage of its total time.
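The example filtering criteria can be sketched as follows. The record type, the function names, and the "viewed" thresholds (10 seconds or 50% of the running time) are hypothetical. The sketch also assumes that the per-video view count has already been restricted to the previous M days, and that when more than N videos qualify the most-viewed ones are kept; the description does not specify a ranking.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    video_id: str
    length_seconds: float
    recent_views: int  # qualifying views within the previous M days (assumed precomputed)


def counts_as_view(watched_seconds, length_seconds, min_seconds=10.0, min_fraction=0.5):
    """Deem a video 'viewed' if watched for a minimum time or a minimum fraction."""
    if length_seconds <= 0:
        return False
    return (watched_seconds >= min_seconds
            or watched_seconds / length_seconds >= min_fraction)


def select_training_set(videos, n=1000, k=1_000_000, t=30.0):
    """Keep up to N videos viewed at least K times and at least T seconds long."""
    eligible = [v for v in videos if v.recent_views >= k and v.length_seconds >= t]
    # Ranking by view count is an assumption made for this sketch.
    eligible.sort(key=lambda v: v.recent_views, reverse=True)
    return eligible[:n]
```

Keeping K, N, and T as parameters reflects that they are design decisions left to the system administrator.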

With the training set of videos identified, the process of correlating video data (e.g. feature vectors representing the images of the video) with demographic data performs two independent operations, which may be performed in parallel: creation of demographic distributions 310 and extraction of video data 320. Based on the results of these operations, correlation of the demographic and video data can be performed. These processes are repeated for each video in the training set.

During distribution creation 310, the demographics are first extracted 311 from the viewer profiles associated with a given video. This entails identifying the viewers specified in the video access log 160 as having watched the given video within the relevant time period or number of viewings, retrieving their associated viewer profiles in the viewer profile repository 105, and retrieving the demographic attributes of interest from the identified viewer profiles. Those viewer profiles lacking the demographic attributes of interest may be excluded from distribution creation, or they may be considered as “unspecified” entries with respect to those attributes. For example, if age and gender are the attributes of interest and profiles lacking them are excluded, then all viewer profiles having these attributes are examined, and those viewer profiles for which the attributes are not specified are not examined. Attributes may also be filtered to discard those that appear to be inaccurate. For example, age attributes below or above a certain threshold age, e.g. under the age of 3 or over the age of 110, could be discarded on the assumption that it is unlikely that a person of that age would genuinely be a viewer.
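A minimal sketch of the extraction step 311 follows, using the exclusion variant (profiles lacking an attribute of interest are dropped rather than counted as “unspecified”). Representing a viewer profile as a dictionary, and the function name, are assumptions made for illustration.

```python
def extract_attributes(profiles, attributes=("age", "gender"), min_age=3, max_age=110):
    """Return the attributes of interest from profiles that pass the filters.

    Profiles missing any attribute of interest are excluded, and ages outside
    [min_age, max_age] are discarded as likely inaccurate.
    """
    kept = []
    for profile in profiles:
        if any(profile.get(name) is None for name in attributes):
            continue
        if "age" in attributes and not (min_age <= profile["age"] <= max_age):
            continue
        kept.append({name: profile[name] for name in attributes})
    return kept
```

Only the attributes of interest are copied into the result, so other profile fields such as location do not travel further into the pipeline than needed.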

Demographic distributions are then created 312 based on the extracted attributes. As previously noted, data representing continuous values such as age or income can be segregated into bins. The range for each bin for a given attribute can be varied as desired for the degree of granularity of interest. The distribution data may be stored in different types of data structures, such as an array, with the value of an array element being derivable from the array index. Values representing discrete unrelated values, such as location or level of education, can be stored in an arbitrary order, with one value per element. Each attribute bin stores a count for the number of values in the bin from the viewer profiles. Once all the relevant attributes have been factored into their corresponding distributions, the result is a set of distributions, one per video, for every relevant attribute and/or combinations thereof. As mentioned above, these distributions include age distribution, gender distribution, income distribution, education distribution, location distribution, and the like. Any of these can be combined into multi-attribute distributions, e.g., age-gender, or age-income, or gender-location.
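Binning ages into an array-backed distribution can be sketched as follows. The bin edges follow the age ranges used in the earlier example distribution; the function name and the choice to ignore ages outside all bins are assumptions.

```python
from bisect import bisect_right

# Lower edges of the bins 13-17, 18-21, 22-25, 26-30, plus a closing upper bound.
AGE_BIN_EDGES = [13, 18, 22, 26, 31]


def age_distribution(ages, edges=AGE_BIN_EDGES):
    """Count ages per bin and convert the counts to percentages.

    The bin of an age is derived from its array index; ages outside every
    bin are ignored in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for age in ages:
        index = bisect_right(edges, age) - 1
        if 0 <= index < len(counts):
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return counts, [0.0] * len(counts)
    return counts, [100.0 * count / total for count in counts]
```

Changing the granularity only requires a different list of edges, e.g. one edge per year for 1-year bins.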

Independently of the distribution creation 310, the video content analysis module 255 extracts 320 video data from each video in the training set of videos, representing the data as a set of “feature vectors.” A feature vector quantitatively describes a visual (or auditory) aspect of the video. Different embodiments analyze either or both of these categories of aspects.

In general, feature vectors are associated with frames of the video. In one embodiment, the feature vectors are associated not merely with a certain frame, but with particular visual objects within that frame. In such an embodiment, when extracting data relating to visual aspects, the video content analysis module 255 performs 321 object segmentation on a video, resulting in a set of visually distinct objects for the video. Object segmentation preferably identifies objects that would be considered foreground objects, rather than background objects. For example, for a video about life in the Antarctic, the objects picked out as part of the segmentation process could include regions corresponding to penguins, polar bears, boats, and the like, though the objects need not actually be identified as such by name.

Different object segmentation algorithms may be employed in different embodiments, such as adaptive background subtraction, spatial and temporal segmentation with clustering algorithms, and other algorithms known to those of skill in the art. In one embodiment, a mean shift algorithm is used, which employs clustering within a single image frame of a video. In segmentation based on the mean shift algorithm, an image is converted into tokens, e.g. by converting each pixel of the image into a corresponding value, such as color value, gradient value, texture measurement value, etc. Then windows are positioned uniformly around the data, and for each window the centroid (the mean location of the data values in the window) is computed, and each window re-centered around that point. This is repeated until the windows converge, i.e. a local center is found. The data traversed by windows that converged to the same point are then clustered together, producing a set of separate image regions. In the case of a video, the same or similar image regions typically exist across video frames, e.g. a region representing the same face at the same location across a number of frames, or at slightly offset locations. In this case, one of the set of similar regions can be chosen as representative and the rest discarded, or the data associated with the images may be averaged.
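The window-shifting loop of the mean shift algorithm can be illustrated on scalar tokens, such as grayscale intensities. This is a simplified sketch rather than a full image segmenter: it assumes one-dimensional tokens, a flat window of fixed bandwidth, and one starting window per token instead of uniformly positioned windows. The function name and parameters are hypothetical.

```python
def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Cluster scalar tokens by shifting a flat window to its local mean."""
    centers = []
    for center in values:  # one starting window per token (a simplification)
        for _ in range(max_iter):
            window = [v for v in values if abs(v - center) <= bandwidth]
            new_center = sum(window) / len(window)  # centroid of the window
            if abs(new_center - center) < tol:
                break  # the window has converged on a local center
            center = new_center
        centers.append(center)

    # Tokens whose windows converged to nearly the same point share a label.
    modes, labels = [], []
    for center in centers:
        for label, mode in enumerate(modes):
            if abs(center - mode) <= bandwidth / 2:
                labels.append(label)
                break
        else:
            modes.append(center)
            labels.append(len(modes) - 1)
    return labels, modes
```

For image data the tokens would be vectors (for example color plus pixel position), and the absolute difference would be replaced by a distance between vectors; the shifting and grouping steps stay the same.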

The result of application of a segmentation algorithm to a video is a set of distinct objects, each occupying one of the regions found by the segmentation algorithm. Since different segmentation algorithms, or differently parameterized versions of the same algorithm, tend to produce non-identical results, in one embodiment multiple segmentation algorithms are used, and objects that are sufficiently common across all the segmentation algorithm result sets are retained as representing valid objects. An object segmented by one algorithm could be considered the same as that segmented by another algorithm if it occupies substantially the same region of the image content object as the other segmented object, e.g. having N % of its pixels in common, where N can be, for example, 90% or more; a higher value of N results in a greater assurance that the same object was identified by the different algorithms. The object could be considered sufficiently common if it is the same as objects in the result sets of all the other segmentation algorithms, or a majority or a set number or percentage thereof.
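The agreement test between segmentation results can be sketched as follows, with each region represented as a set of pixel coordinates. The function names are hypothetical, and measuring the common fraction against the larger of the two regions is an assumption; the description only requires that N % of the pixels be in common.

```python
def rect(x0, y0, width, height):
    """Helper for the example: a rectangular region as a set of (x, y) pixels."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def same_object(region_a, region_b, min_common=0.90):
    """Treat two regions as the same object if enough pixels are shared."""
    if not region_a or not region_b:
        return False
    common = len(region_a & region_b)
    return common / max(len(region_a), len(region_b)) >= min_common


def retain_common_objects(result_sets, min_common=0.90, min_votes=None):
    """Keep regions of the first result set confirmed by enough other sets.

    By default every other algorithm must agree; pass min_votes to require
    only a majority or a set number instead.
    """
    others = result_sets[1:]
    if min_votes is None:
        min_votes = len(others)
    retained = []
    for region in result_sets[0]:
        votes = sum(
            any(same_object(region, candidate, min_common) for candidate in other)
            for other in others
        )
        if votes >= min_votes:
            retained.append(region)
    return retained
```

Raising `min_common` gives greater assurance that the different algorithms identified the same object, at the cost of discarding objects whose boundaries the algorithms drew slightly differently.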

Characteristics are extracted 322 from content of the video. In one embodiment, the characteristics are represented as feature vectors, lists of data pertaining to various attributes, such as color (e.g. RGB, HSV, and LAB color spaces), texture (as represented by Gabor and Haar wavelets), edge direction, motion, optical flow, luminosity, transform data, and the like. In different embodiments, a given frame (or object of a frame) may be represented by one feature vector, or by a number of feature vectors corresponding to different portions of the frame/object, e.g. to points at which there is a sharp change between color values, or different attributes. In any case, the extracted feature vectors are then stored within the feature vector repository 215 in association with the video to which they correspond.

Some embodiments create feature vectors for audio features, instead of or in addition to video features. For example, audio samples can be taken periodically over a chosen interval. As a more specific example, the mel-frequency cepstrum coefficients (MFCCs) can be computed at 10 millisecond intervals over a duration of 30 seconds, starting after a suitable delay from the beginning of the video, e.g. 5 seconds. The resulting MFCCs may then be averaged or aggregated across the 30 second sampling period, and are stored in the feature vector repository 215. Feature vectors can also be derived based on beat, pitch, or discrete wavelet outputs, or from speech recognition output or music/speaker identification systems.
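The aggregation of MFCCs across the sampling period can be sketched as follows. Computing the coefficients themselves requires an audio analysis library and is outside this sketch; the per-frame coefficient vectors are assumed to be already available, and the function names are hypothetical.

```python
def frame_count(duration_s=30.0, hop_ms=10.0):
    """Number of analysis frames for a sampling period at a given interval."""
    return int(round(duration_s * 1000.0 / hop_ms))


def average_mfccs(frames):
    """Average per-frame coefficient vectors into a single feature vector."""
    if not frames:
        return []
    width = len(frames[0])
    if any(len(frame) != width for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(width)]
```

With the example parameters, a 30 second period sampled every 10 milliseconds yields 3000 frames, which the averaging step collapses into one vector with one entry per coefficient.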

Some embodiments create feature vectors based on metadata associated with the video. Such metadata can include, for example, video title, video description, date of video uploading, the user who uploaded, text of a video comment, a number of comments, a rating or the number of ratings, a number of views by users, user co-views of the video, user keywords or tags for the video, and the like.

The feature vector data when extracted are frequently not in an ideal state, containing a large number of feature vectors, some of which are irrelevant, adding no additional information. The potentially large number and low quality of the feature vectors increase the computational cost and reduce the accuracy of later techniques that analyze the feature vectors. In order to reduce the size and improve the quality of the feature vector data, the video content analysis module 255 therefore performs 323 dimensionality reduction. Different embodiments may employ different algorithms for this purpose, including principal component analysis (PCA), linear discriminant analysis (LDA), multi-dimensional scaling (MDS), Isomap, locally linear embedding (LLE), and other similar algorithms known to those of skill in the art. The result of application of a dimensionality reduction algorithm to a first set of feature vectors is a second, smaller set of vectors representative of the first set, which can replace their prior, non-reduced versions in the feature vector repository 215.
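Of the listed algorithms, PCA is the simplest to sketch. The version below is a minimal illustration under stated assumptions: it finds principal axes by power iteration with deflation rather than a library eigensolver, it needs at least two input vectors, and its fixed starting vector would fail in the rare case that it is exactly orthogonal to the leading axis. All function names are hypothetical.

```python
import math


def _mean_center(vectors):
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - means[j] for j in range(d)] for v in vectors]


def _covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def _matvec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def _top_eigenvector(matrix, iterations=200):
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # fixed, arbitrary starting vector
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = _matvec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def pca_reduce(vectors, components=1):
    """Project feature vectors onto their leading principal axes."""
    centered = _mean_center(vectors)
    cov = _covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(components):
        axis = _top_eigenvector(cov)
        eigenvalue = sum(a * b for a, b in zip(axis, _matvec(cov, axis)))
        axes.append(axis)
        # Deflate: remove the found direction before searching for the next.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r * a for r, a in zip(row, axis)) for axis in axes]
            for row in centered]
```

Each input vector is replaced by a shorter vector with one entry per retained axis, which is the sense in which the reduced data can replace the non-reduced versions in storage.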

With the demographic database 210 and feature vector repository 215populated with data as a result of steps 310 and 320, respectively, thecorrelation module 260 correlates 330 (i.e., forms some associationbetween) the demographics and the video content as represented by thefeature vectors, creating as output a prediction model 220 thatrepresents all videos in the training set. The correlation is performedbased on machine learning techniques, such as supervised algorithms suchas support vector machines (SVM), boosting, nearest neighbor, ordecision tree, semi-supervised algorithms such as transductive learning,or unsupervised learning, such as clustering. In one embodiment, SVMkernel logistic regression techniques are employed.

Regardless of the particular algorithm employed, the output is a predicted distribution for the demographic categories in question, and is stored as a prediction model 220. In the case of a demographic category such as age that can be represented with a continuous distribution function, the distribution can be stored as a set of discrete values, e.g., a probability for each year in an age distribution, thus creating a discrete approximation of a continuous distribution. Alternatively, coefficients of an equation generating a function representing the distribution can be stored. For demographic categories inherently having discrete values, such as gender or location, a set of probabilities may be provided, one per value, for example. Thus, given a set of feature vectors representing a video, the prediction model 220 will have a set of corresponding predicted distributions for various demographic attributes.
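
As one illustrative sketch of the discrete representation described above, an age distribution may be stored as a mapping from each year of age to a probability. The function name `age_distribution`, the age bounds, and the input list of viewer ages are assumptions made for purposes of example only.

```python
from collections import Counter

def age_distribution(viewer_ages, min_age=13, max_age=100):
    """Return {age: probability}, a discrete approximation of an age distribution."""
    # Ages outside the assumed plausible range are ignored.
    counts = Counter(a for a in viewer_ages if min_age <= a <= max_age)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {age: counts[age] / total for age in sorted(counts)}

# Hypothetical ages of five viewers; the age 7 falls outside the range.
dist = age_distribution([18, 18, 21, 30, 7])
```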

For example, one prediction model storing data for the age demographic attribute could be as in the table below, where each of the three rows represents a set of feature vectors and their corresponding age distribution for ages 13-17, 18-21, etc. It is appreciated that such a table is merely for purposes of example, and that a typical implementation would have much additional data for more sets of feature vectors, a greater number and granularity of ages, more demographic attributes or combinations thereof, and the like.

Feature vectors   13-17   18-21   22-25   26-30   . . .
F1, F2, F3        10%     18%     32%     19%     . . .
F4, F5            15%     22%     38%     16%     . . .
F6, F7            30%     20%     10%      5%     . . .

Applications of the Prediction Model

The video hosting website 100 provides a number of different usage scenarios. One usage scenario is prediction of demographic attribute values for a video, such as a newly submitted video. In this scenario, a video that has not been previously classified for its demographic attributes is received. This can be a video that has been previously uploaded to the video hosting website 100, or a video that is currently in the process of being uploaded. This video's visual and/or audio feature vectors are extracted by the video content analysis module 255. Then, the extracted feature vectors are matched against those of the prediction model 220, and a set of feature vectors providing the closest match is identified, each feature vector having a match strength. In one embodiment, the match strength is determined by use of a measure matrix. In one embodiment, the prediction model uses a predefined similarity measure, e.g., a Gaussian kernel between pairs of feature vectors. In one embodiment, only one closest feature vector is identified (i.e., the set contains only one feature vector) and the corresponding demographic distributions for the demographic attributes in question are retrieved from the prediction model 220. In another embodiment, the set may contain multiple feature vectors, in which case the demographic distributions may be linearly combined, with the respective match strengths providing the combination weightings. In another embodiment, the set of feature vectors as a whole is used to look up corresponding demographic distributions in the prediction model 220. For example, if the age and gender demographic categories are of interest, then for a given video, predicted distributions could be produced that comprise probabilities that viewers of the video would be in the various possible ages and of the male and female genders. The ability to obtain predicted demographic distributions with respect to a given video has various useful applications.
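
The matching and linear combination described above might be sketched as follows, by way of example only. The sketch assumes that the prediction model is held as a list of (feature vector, distribution) pairs and that the match strength is a Gaussian kernel value; the names `MODEL`, `gaussian_kernel`, and `predict_distribution`, the parameter `sigma`, and all numeric values are hypothetical.

```python
import math

# Hypothetical model entries: (feature vector, distribution over three age ranges).
MODEL = [
    ((0.0, 0.0), (0.5, 0.3, 0.2)),
    ((1.0, 0.0), (0.2, 0.5, 0.3)),
    ((0.0, 1.0), (0.1, 0.2, 0.7)),
]

def gaussian_kernel(u, v, sigma=1.0):
    """Match strength in (0, 1]; equals 1 when the two vectors are identical."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def predict_distribution(model, query, k=2, sigma=1.0):
    """Linearly combine the distributions of the k closest model entries,
    weighted by their match strengths."""
    scored = sorted(
        ((gaussian_kernel(vec, query, sigma), dist) for vec, dist in model),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(strength for strength, _ in scored)
    n = len(scored[0][1])
    if total == 0.0:
        # All strengths underflowed to zero; fall back to a plain average.
        return [sum(d[i] for _, d in scored) / len(scored) for i in range(n)]
    return [sum(s * d[i] for s, d in scored) / total for i in range(n)]

single = predict_distribution(MODEL, (0.0, 0.0), k=1)
blended = predict_distribution(MODEL, (0.5, 0.0), k=2)
```

With `k=1` the sketch corresponds to the embodiment in which only one closest feature vector is identified; with larger `k` it corresponds to the linearly combined embodiment.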

A second usage scenario, related to the first scenario, is to identify top demographic values of an attribute of interest for which a new video would likely be relevant. For example, when a video is analyzed, the probabilities that a viewer would be of the various ages within the age demographic category could be computed as in the first scenario, the probabilities sorted, and a determination made that the video appeals most strongly to people of the age range(s) with the top probability, e.g., 13-15 year olds.
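
A minimal sketch of this sorting step, assuming the predicted distribution is held as a mapping from age range to probability, is given below. The function name `top_values` and the probabilities are hypothetical.

```python
def top_values(distribution, n=1):
    """Return the n (value, probability) pairs with the highest probability."""
    return sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical predicted age distribution for a newly analyzed video.
predicted = {"13-15": 0.45, "16-18": 0.30, "19-21": 0.25}
best = top_values(predicted)
```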

A third usage scenario is to determine likely demographic values associated with a viewer who either lacks a viewer profile, or whose viewer profile is untrustworthy (e.g., indicates an improbable attribute, such as being above age 110). In this application, the viewer's previously-watched videos are identified by examining the video access log 160 for the videos retrieved by the same IP address as the viewer. From this list of videos one or more videos are selected, and their feature vectors are retrieved from the feature vector repository 215 (if present) or are extracted by the video content analysis module 255. The resulting feature vectors are then input into the prediction model 220 to obtain the predicted demographics for each video. To estimate the viewer's demographic, the demographic strengths for each video watched by that viewer can be combined, such as by averaging the demographics for each video, by a weighted average that weights the demographics for the videos according to how frequently the respective videos were watched by that viewer, and the like. As a result, combined probabilities can be computed for each demographic category, and a top value or values chosen in each, e.g., 21 as the age value, and male as the gender value, representing that the viewer is believed to most probably be a 21 year old male.
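
The frequency-weighted combination described above might be sketched as follows. The sketch assumes that a predicted distribution has already been obtained for each watched video and that watch counts have been derived from the access log; the function name `estimate_viewer_demographics`, the video identifiers, and the numeric values are hypothetical.

```python
def estimate_viewer_demographics(video_distributions, watch_counts):
    """Average per-video distributions, weighting each video by how often it was watched.

    video_distributions: {video_id: {value: probability}}
    watch_counts: {video_id: number of times the viewer watched the video}
    """
    total = sum(watch_counts.get(v, 0) for v in video_distributions)
    if total == 0:
        raise ValueError("no watch history available for this viewer")
    combined = {}
    for video_id, dist in video_distributions.items():
        weight = watch_counts.get(video_id, 0) / total
        for value, prob in dist.items():
            combined[value] = combined.get(value, 0.0) + weight * prob
    return combined

# Hypothetical gender distributions for two videos watched 3 times and 1 time.
gender = estimate_viewer_demographics(
    {"v1": {"male": 0.8, "female": 0.2}, "v2": {"male": 0.4, "female": 0.6}},
    {"v1": 3, "v2": 1},
)
most_likely = max(gender, key=gender.get)
```

Setting every watch count to 1 reduces the sketch to the unweighted average also mentioned above.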

Another usage scenario is to predict, for a given set of demographic attribute values, what videos would be of interest to viewers with such demographics. This is useful, for example, to create a list of recommended videos for such a viewer. This scenario involves further processing of the demographic probability data to identify the top-scoring videos for a given demographic value, and the processed data can then be used as one factor for identifying what videos may be of interest to a given viewer. For example, when a new video is submitted, the video demographics analysis server 130 computes a set of demographic values having the highest match probabilities for the video for categories of interest. For instance, for a video containing content related to social security benefits, the highest value for the gender category might be female with match strength 0.7, the highest attribute values for the age category might be 60, 62, 63, 55, and 65, with respective match strengths 0.8, 0.7, 0.75, 0.85, and 0.8, and the highest attribute values for the gender-age combination category might be female/60 and female/62, with respective match probabilities 0.95 and 0.9. These computed demographic probabilities can be stored for each video, e.g., as part of the video database 155, and a list of the videos with the top scores for each demographic category attribute stored. For example, the top-scoring videos for people of age 41 might be a video trailer for the film “Pride & Prejudice” and a video on landscaping, and the top-scoring videos for males with college degrees might be a video about mortgage foreclosures and an instructional video on golf.
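
One possible way to derive the per-demographic lists of top-scoring videos from stored per-video probabilities is sketched below. The function name `build_top_lists`, the video identifiers, the demographic value labels, and the scores are all hypothetical and are given for illustration only.

```python
import heapq
from collections import defaultdict

def build_top_lists(video_scores, n=2):
    """Invert {video_id: {demographic value: score}} into
    {demographic value: [top-n video ids, best first]}."""
    by_value = defaultdict(list)
    for video_id, scores in video_scores.items():
        for value, score in scores.items():
            by_value[value].append((score, video_id))
    return {
        value: [video_id for _, video_id in heapq.nlargest(n, pairs)]
        for value, pairs in by_value.items()
    }

# Hypothetical stored scores for three videos and two demographic values.
video_scores = {
    "golf_lesson": {"male": 0.9, "age_41": 0.3},
    "landscaping": {"male": 0.4, "age_41": 0.8},
    "film_trailer": {"male": 0.2, "age_41": 0.7},
}
top_lists = build_top_lists(video_scores, n=2)
```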

These lists of top videos for different demographics can then be applied to identify recommendations for related videos. For example, if a viewer is viewing a video about the Antarctic with submitter-supplied description “Look at the cute penguins,” the video demographics analysis server 130 can refer to his profile, determine that he is a male college graduate, and potentially recommend the videos on mortgage foreclosures and golf instruction, based upon the videos associated with these demographics via the prediction model. These recommendations can be made in addition to those recommended based on other data, such as the keyword “penguins,” keywords specified in the viewer's profile as being of interest to that viewer, and the like. The demographics-derived recommendations can be displayed unconditionally, in addition to the other recommendations, or conditionally, based on comparisons of computed relevance values, for example. Similarly, the various recommendations may be ordered according to computed relevance values, with each recommendation source (e.g., derived from demographics, or from keyword matches) possibly having its own particular formula for computing a relevance value.

Still another usage scenario is serving demographic queries, i.e., providing demographic information across videos. For example, a user (either a human or a program) could submit a query requesting the average age of the viewers across all the videos in the video database 155, or some subset of these videos, the answer factoring in estimated ages of users who otherwise lack profiles. As another example, a user could submit a query requesting the top 10 videos for women aged 55 or older.
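
As an illustrative sketch of the first example query, the average viewer age might be computed by preferring the age from a viewer profile and falling back to a model-estimated age where no profile age exists. The function name `average_viewer_age`, the viewer identifiers, and the ages are hypothetical.

```python
def average_viewer_age(profile_ages, estimated_ages):
    """Mean age over viewers, using the profile age when present and the
    estimated age otherwise; viewers with neither are skipped."""
    ages = []
    for viewer_id, age in profile_ages.items():
        if age is None:
            age = estimated_ages.get(viewer_id)
        if age is not None:
            ages.append(age)
    return sum(ages) / len(ages) if ages else None

# Hypothetical viewers: "b" lacks a profile age but has an estimate; "d" has neither.
mean_age = average_viewer_age(
    {"a": 30, "b": None, "c": 50, "d": None},
    {"b": 40},
)
```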

The present invention has been described in particular detail with respect to one possible embodiment. Those of skill in the art will appreciate that the invention may be practiced in other embodiments. First, the particular naming of the components and variables, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols. Also, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.

Some portions of the above description present the features of the present invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.

Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.

The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of computer-readable storage medium suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

The algorithms and operations presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will be apparent to those of skill in the art, along with equivalent variations. In addition, the present invention is not described with reference to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references to specific languages are provided for disclosure of enablement and best mode of the present invention.

The present invention is well suited to a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.

Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

1. A computer-implemented method of generating a prediction model for videos, comprising: receiving a plurality of videos from a video repository, each video having an associated list of viewers; for each video, creating a demographic distribution for at least one demographic attribute based at least in part on viewer demographic data associated with viewers of the video; for each video, generating feature vectors based at least in part on the content of the video; generating a prediction model that correlates the feature vectors for the videos and the demographic distributions; and storing the prediction model.
2. The computer-implemented method of claim 1, wherein the demographic attribute is one of age and gender.
3. The computer-implemented method of claim 1, wherein the demographic attribute is one of occupation, household income, and location.
4. The computer-implemented method of claim 1, wherein the prediction model is generated using support vector machines.
5. The computer-implemented method of claim 1, further comprising altering the generated feature vectors using a dimensionality reduction algorithm.
6. The computer-implemented method of claim 1, wherein the generated feature vectors include feature vectors generated based on audio content of the video and vectors generated based on visual content of the video.
7. The computer-implemented method of claim 1, wherein the feature vectors are generated based at least in part on metadata associated with the video.
8. The computer-implemented method of claim 1, further comprising: performing object segmentation on a frame of the video, thereby identifying a visual object of the frame; wherein generating feature vectors based at least in part on the content of the video comprises generating feature vectors for the identified visual object.
9. A computer-implemented method for determining demographics of a video, comprising: storing a prediction model that correlates viewer demographic attributes with feature vectors extracted from videos viewed by viewers, wherein the viewer demographic attributes include age and gender; receiving a video; generating from content of the video a set of feature vectors; and identifying demographic attribute values by applying the prediction model to the generated set of feature vectors.
10. The computer-implemented method of claim 9, wherein identifying demographic attribute values comprises: identifying a set of feature vectors of the prediction model that is most similar to the generated set of feature vectors; and identifying, in the prediction model, demographic attribute values most strongly correlated with the identified feature vectors.
11. A computer-implemented method for identifying demographics associated with a viewer, comprising: storing a prediction model that correlates viewer demographic attributes with feature vectors generated from videos viewed by viewers; identifying a set of videos viewed by a given viewer; generating, from content of the set of videos, feature vectors; applying the feature vectors to the prediction model to identify viewer demographic attribute values most strongly correlated with the feature vectors of the prediction model; and identifying viewer demographic attribute values most strongly correlated with the given viewer based at least in part on the identified viewer demographic attribute values.
12. A computer-implemented method for identifying videos associated with given demographic attribute values, comprising: storing a prediction model that correlates viewer demographic attributes with feature vectors generated from videos viewed by viewers; receiving a plurality of videos; for each video of the plurality of videos: generating feature vectors from the video; applying the feature vectors generated from the video to the prediction model to identify viewer demographic attribute values most strongly correlated with the feature vectors of the prediction model; storing the identified viewer demographic attribute values in association with the video; selecting videos having highest values for the given demographic attribute values; and displaying identifiers of the selected videos.
13. A computer readable storage medium storing a computer program executable by a processor for generating a prediction model for videos, the actions of the computer program comprising: receiving a plurality of videos from a video repository, each video having an associated list of viewers; for each video, creating a demographic distribution for at least one demographic attribute based at least in part on viewer demographic data associated with viewers of the video; for each video, generating feature vectors based at least in part on the content of the video; generating a prediction model that correlates the feature vectors for the videos and the demographic distributions; and storing the prediction model.
14. The computer readable storage medium of claim 13, wherein the generated feature vectors include feature vectors generated based on audio content of the video and vectors generated based on visual content of the video.
15. The computer readable storage medium of claim 13, wherein the prediction model is generated using support vector machines.
16. A computer system for generating a prediction model for videos, comprising: a video repository storing a plurality of videos, each video having an associated list of viewers; a video analysis server adapted to: receive a plurality of videos from the video repository; for each video, create a demographic distribution for at least one demographic attribute based at least in part on viewer demographic data associated with viewers of the video; for each video, generate feature vectors based at least in part on the content of the video; generate a prediction model that correlates the feature vectors for the videos and the demographic distributions; and store the prediction model.
17. The computer system of claim 16, wherein the demographic attribute is one of age and gender.
18. The computer system of claim 16, wherein the prediction model is generated using support vector machines.
19. The computer system of claim 16, wherein the generated feature vectors include feature vectors generated based on audio content of the video and vectors generated based on visual content of the video.
20. The computer system of claim 16, wherein the feature vectors are generated based at least in part on metadata associated with the video.